Lumping a Markov process introduces a coarser level of description that is useful in many contexts 
and applications. The dynamics on the coarse grained states is often approximated by its Markovian 
component. In this letter we derive finite-time bounds on the error in this approximation. These 
results hold for non-reversible dynamics and for probabilistic mappings between microscopic and 
coarse grained states. 
I. INTRODUCTION


Markov processes are a standard modeling tool used in many applications ranging from finance [1] and telecommunications [2] to physics [3], chemistry [4], biology [5], and computer science [6]. For theoretical and practical reasons, it is often convenient to partition the state space into aggregates and to view the dynamics at a coarser level. The coarse-graining operation bridges the gap between different levels of description by introducing "mesostates", each representing many microstates. Analysis of experimental data naturally leads to coarse graining, as observation techniques may not be able to resolve the set of microscopic states and only give access to mesostates. Similarly, disregarding the environment or part of a system provides an effective description of the remaining (sub)system of interest.
A deterministic mapping between micro- and mesostates is too restrictive to be applicable in many problems of interest. The concept of coarse graining can be extended to include the case where the mapping is a probabilistic function of the microstates. The resulting model, called a hidden Markov model, is a doubly embedded stochastic process [7]. Such processes are especially known for their applications in temporal pattern recognition, such as speech and handwriting recognition [8], and in bioinformatics [9].
Although the dynamics on the aggregates is not Markovian in general [13], there is a natural choice for a Markov dynamics on the set of lumped states [10-12]. This dynamics reproduces the influence of the most recent past state on the transition probabilities and neglects higher-order memory effects. This choice matches the one-step evolution of the original unlumped chain started in its stationary state. Furthermore, the probability transfers between aggregates match those arising from the original chain.

In the present work we analyze the accuracy of such coarse-grained models compared with the exact microscopic behavior. This problem was first considered by Hoffman and Salamon in [12] for the special case of deterministic coarse graining and reversible Markov chains. Reversible Markov chains have transition matrices that are diagonally similar to symmetric matrices, a strong symmetry property at the basis of their analysis. Here we generalize their approach to the case of probabilistic coarse graining and non-reversible dynamics. We obtain bounds on the error made when the lumped dynamics is used as a model of the unlumped dynamics.
II. LUMPED MARKOV CHAINS

We consider a Markov chain characterized by a transition matrix $G$ on the finite state space $S$ with $N$ states. The probability distribution $p = (p_1, p_2, \ldots, p_N)$ evolves in discrete time steps according to

$$p(n+1) = p(n)\, G. \tag{1}$$

We assume that the Markov chain is primitive, i.e., there exists an $n_0$ such that $G^{n_0}$ has all positive entries. This guarantees that $G$ has a unique stationary distribution $\pi$, with all $\pi_i > 0$, such that

$$\pi = \pi G. \tag{2}$$

Our goal is to analyze lumped dynamics. Let $\Omega$ be the set of mesostates $\omega$. The mesostate $\omega$ is observed with probability $b_i(\omega)$ when the system is in microstate $i$. We collect these conditional probability distributions into the matrix $C$ with elements

$$C_{i\omega} = b_i(\omega). \tag{3}$$

The matrix $C$ serves to specify the lumped probability distribution $\bar p = pC$ on $\Omega$ corresponding to a distribution $p$ on $S$. We also introduce the matrix $D$ with elements

$$D_{\omega i} = \frac{\pi_i C_{i\omega}}{\bar\pi_\omega}, \qquad \bar\pi_\omega = \sum_{i \in S} \pi_i C_{i\omega}. \tag{4}$$

The element $D_{\omega i}$ is the conditional probability, in the stationary state, to be in microstate $i$ given the observation $\omega$. In the case of a deterministic association between microstates and aggregates, the operators $C$ and $D$ reduce to the operators introduced in [12]. Their successive action defines a stochastic operator $CD$ that satisfies

$$\pi = \pi C D. \tag{5}$$

We now introduce the lumped dynamics with transition matrix

$$\bar G = D G C. \tag{6}$$

This matrix is stochastic, $\bar G_{\omega\omega'} \ge 0$ and $\sum_{\omega'} \bar G_{\omega\omega'} = 1$. This choice of the transition matrix ensures that the distribution $\bar\pi = \pi C$ is the stationary distribution of $\bar G$:

$$\bar\pi \bar G = \pi C D G C = \pi G C = \pi C = \bar\pi. \tag{7}$$

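By construction, the dynamics $\bar G$ also preserves the probability fluxes between states in the coarse-grained description: $\bar\pi_\omega \bar G_{\omega\omega'} = \sum_{i,j} \pi_i C_{i\omega} G_{ij} C_{j\omega'}$ is the stationary probability of observing $\omega$ followed by $\omega'$ in the original chain. Precisely, the dynamics $\bar G$ arises from the Markovian approximation of a stationary sequence of observed mesostates, and $\bar G$ is the unique Markov chain on $\Omega$ that satisfies this condition.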
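The construction of Eqs. (3)-(7) can be checked numerically. The following is a minimal NumPy sketch, assuming $D_{\omega i} = \pi_i C_{i\omega}/\bar\pi_\omega$ (the stationary conditional probability of microstate $i$ given $\omega$); the helper names `stationary` and `lump` and the example matrices are illustrative choices, not taken from the paper. It verifies that $\bar G$ is stochastic, that $\pi = \pi C D$, that $\bar\pi$ is stationary for $\bar G$, and that the lumped fluxes $\bar\pi_\omega \bar G_{\omega\omega'}$ agree with the joint probabilities of consecutive observations.

```python
import numpy as np

def stationary(G):
    """Stationary row vector pi with pi @ G = pi (left eigenvector for eigenvalue 1)."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def lump(G, C):
    """Return pi, pi_bar, D and G_bar = D G C for observation probabilities C[i, w] = b_i(w)."""
    pi = stationary(G)
    pi_bar = pi @ C                              # lumped stationary distribution pi_bar = pi C
    D = (pi[:, None] * C).T / pi_bar[:, None]    # assumed Bayes form: D[w, i] = pi_i C[i, w] / pi_bar[w]
    return pi, pi_bar, D, D @ G @ C

# Illustrative non-reversible chain on three microstates; the middle microstate
# is observed as either of the two mesostates (probabilistic coarse graining).
G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])

pi, pi_bar, D, G_bar = lump(G, C)
print(np.allclose(G_bar.sum(axis=1), 1.0))        # G_bar is stochastic
print(np.allclose(pi @ C @ D, pi))                # eq. (5)
print(np.allclose(pi_bar @ G_bar, pi_bar))        # eq. (7)
# Lumped fluxes equal the stationary joint probabilities of consecutive observations.
print(np.allclose(pi_bar[:, None] * G_bar, C.T @ (pi[:, None] * G) @ C))
```

Solving for $\pi$ through an eigenvector is adequate for small dense examples like this one; for large sparse chains an iterative solver would be the natural replacement.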



III. BOUNDING COARSE GRAINING ERRORS 



Starting from a distribution $p_0$ on $S$, its time evolution among the aggregates under the unlumped dynamics is $p_0 G^n C$, while its time evolution under the lumped dynamics is $p_0 C \bar G^n$. The main question considered here is how different these two dynamics can be. To address this question, we define the norm $\|v\|_\pi = \|v U_\pi^{-1}\|_2$, where $U_\pi = \mathrm{diag}(\sqrt{\pi_1}, \sqrt{\pi_2}, \ldots, \sqrt{\pi_N})$ and $\|\cdot\|_2$ is the 2-norm. The corresponding operator norm is

$$\|A\|_\pi = \|U_\pi A U_\pi^{-1}\|_2. \tag{8}$$
The difference between the two probability distributions after n time steps can be expressed as 

|7>oCG"-poG"c||^ = \\poC{DGCr -poG^CW^ 
= \\po{CDGrC-poG"C\\^ 

= WpoiiCDGr-G^CW^ ■ (9) 

As emphasized in [12], this brings out the prominent role of the operator $H = CDG$, which specifies a dynamics on the original state space $S$.
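A short sketch of the weighted norms of Eq. (8) and of the rewriting in Eq. (9), under the same illustrative assumptions as before (Bayes form of $D$, hypothetical example matrices). Vectors on the mesostates are measured here with the analogous norm built from $\bar\pi$; that choice, the helper names `vec_norm` and `op_norm`, and the starting distribution are assumptions of the sketch.

```python
import numpy as np
from numpy.linalg import matrix_power

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def vec_norm(v, p):
    """Weighted vector norm ||v||_p = ||v U_p^{-1}||_2 with U_p = diag(sqrt(p))."""
    return np.linalg.norm(v / np.sqrt(p))

def op_norm(A, p):
    """Operator norm ||A||_p = ||U_p A U_p^{-1}||_2, i.e. the largest singular value."""
    s = np.sqrt(p)
    return np.linalg.norm(s[:, None] * A / s[None, :], 2)

G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])
pi = stationary(G)
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
G_bar = D @ G @ C
H = C @ D @ G                                # H = CDG acts on the microstates

p0 = np.array([1.0, 0.0, 0.0])               # start in the first microstate
n = 4
lumped = p0 @ C @ matrix_power(G_bar, n)     # lumped dynamics on the mesostates
exact = p0 @ matrix_power(G, n) @ C          # unlumped dynamics, then observed
via_H = p0 @ matrix_power(H, n) @ C          # rewriting used in eq. (9)

err = vec_norm(lumped - exact, pi_bar)       # error measured with the pi_bar norm
bound = vec_norm(p0, pi) * op_norm(matrix_power(H, n) - matrix_power(G, n), pi)
print(np.allclose(lumped, via_H), err, bound)
```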

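Our goal will be to bound the $n$-step difference $\|H^n - G^n\|_\pi$. This suffices because, by the Cauchy-Schwarz inequality, $\|vC\| \le \|v\|_\pi$ for every row vector $v$ on $S$ when vectors on $\Omega$ are measured with the analogous norm built from $\bar\pi$; hence the difference (9) is at most $\|p_0\|_\pi \|H^n - G^n\|_\pi$. The $n$-step difference can grow transiently, but must eventually decline to zero because, by construction, the lumped and the unlumped chains converge to the same stationary distribution.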

We use the fact that $H$ and $G$ have the common stationary distribution $\pi$ (indeed, $\pi H = \pi C D G = \pi G = \pi$). We define the projection operator $P_\pi = u^{\mathsf T} \pi$, where $u$ is the vector $(1, 1, \ldots, 1) \in \mathbb{R}^N$, and the complementary projection $P_\sigma = I - P_\pi$. We end up with the following representation of $G$ and $H$:

$$G = (P_\pi + P_\sigma) G = P_\pi + P_\sigma G \tag{10}$$

and 

$$H = (P_\pi + P_\sigma) H = P_\pi + P_\sigma H. \tag{11}$$

From (10) and (11) we obtain $H - G = P_\sigma H - P_\sigma G$, and $P_\pi P_\sigma G = P_\sigma G P_\pi = P_\pi P_\sigma H = P_\sigma H P_\pi = 0$.
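The decompositions (10) and (11) and the vanishing cross terms can be confirmed in a few lines. This sketch assumes the same hypothetical example chain and Bayes form of $D$ as above; $P_\pi = u^{\mathsf T}\pi$ is formed with `np.outer`.

```python
import numpy as np

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])
pi = stationary(G)
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
H = C @ D @ G

N = len(pi)
P_pi = np.outer(np.ones(N), pi)   # P_pi = u^T pi: every row equals pi
P_sig = np.eye(N) - P_pi          # complementary projection P_sigma

checks = {
    "P_pi is a projection": np.allclose(P_pi @ P_pi, P_pi),
    "eq. (10)": np.allclose(G, P_pi + P_sig @ G),
    "eq. (11)": np.allclose(H, P_pi + P_sig @ H),
    "H - G = P_sig H - P_sig G": np.allclose(H - G, P_sig @ H - P_sig @ G),
    "P_pi P_sig G = 0": np.allclose(P_pi @ P_sig @ G, 0.0),
    "P_sig G P_pi = 0": np.allclose(P_sig @ G @ P_pi, 0.0),
    "P_pi P_sig H = 0": np.allclose(P_pi @ P_sig @ H, 0.0),
    "P_sig H P_pi = 0": np.allclose(P_sig @ H @ P_pi, 0.0),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
print((P_sig @ G).sum(axis=1))    # rows of P_sigma G sum to zero
```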

The norms $\|P_\sigma H\|_\pi$ and $\|P_\sigma G\|_\pi$ will play a crucial role in our analysis. In this regard, we prove the following theorem.

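Theorem 1. Let $G$ be a transition probability matrix with positive stationary distribution $\pi$, and let $P_\sigma$ be the projection operator introduced above. Then $\|P_\sigma G\|_\pi = \sigma_2$, where $\sigma_2$ is the second-largest singular value of $U_\pi G U_\pi^{-1}$. If, in addition, $G$ is primitive and $G G^{\mathsf T}$ is irreducible (as is the case, for example, when $G$ is primitive and all its diagonal elements are positive), then

$$\|P_\sigma G\|_\pi < 1. \tag{12}$$

PROOF OF THEOREM 1. We will use the following result [15]. Let $A$ be a matrix with nonnegative entries and spectral radius $\rho(A)$, and suppose that there exist left and right positive Perron eigenvectors $v$ and $w$, respectively. Then $\rho(A) = \|X A X^{-1}\|_2$, where $X = \mathrm{diag}(v_1^{1/2} w_1^{-1/2}, \ldots, v_N^{1/2} w_N^{-1/2})$.

In our case $v = \pi$ and $w = u^{\mathsf T}$, so that $X = U_\pi$ and $\|G\|_\pi = \|U_\pi G U_\pi^{-1}\|_2 = \rho(G) = 1$. Because the 2-norm of a matrix is given by its dominant singular value, we deduce that the dominant singular value $\sigma_1$ of $U_\pi G U_\pi^{-1}$ equals 1. Equivalently, the dominant eigenvalue of $(U_\pi G U_\pi^{-1})(U_\pi G U_\pi^{-1})^{\mathsf T}$ is $\sigma_1^2 = 1$.

We now turn to the norm $\|P_\sigma G\|_\pi$. Note that $P_\sigma G u^{\mathsf T} = (G - P_\pi) u^{\mathsf T} = 0$, so that $P_\sigma G$ has negative elements and the above construction cannot be applied. Introducing the notation $\tilde A = U_\pi A U_\pi^{-1}$, we have $\|P_\sigma G\|_\pi = \|\tilde P_\sigma \tilde G\|_2$, which is the square root of the largest eigenvalue of $\tilde P_\sigma \tilde G \tilde G^{\mathsf T} \tilde P_\sigma^{\mathsf T}$.

First, we consider the projection operator $P_\pi$. A direct calculation shows that $P_\pi$ is symmetrized by $U_\pi$: $(\tilde P_\pi)_{ij} = \sqrt{\pi_i \pi_j}$. It follows that $\tilde P_\sigma = I - \tilde P_\pi$ is symmetric as well. Now, because $G$ and $P_\pi$ commute ($G P_\pi = P_\pi G = P_\pi$), $\tilde G$ and $\tilde P_\sigma$ commute with each other; since $\tilde P_\sigma$ is symmetric, it also commutes with $\tilde G^{\mathsf T}$. Accordingly, $\tilde P_\sigma \tilde G \tilde G^{\mathsf T} \tilde P_\sigma^{\mathsf T} = \tilde P_\sigma \tilde G \tilde G^{\mathsf T}$. Furthermore, the commuting symmetric matrices $\tilde P_\sigma$ and $\tilde G \tilde G^{\mathsf T}$ share an orthonormal eigenbasis, in which the eigenvalues of $\tilde P_\sigma \tilde G \tilde G^{\mathsf T}$ take the form $\alpha_i \beta_i$, where $\alpha_i$ and $\beta_i$ are the eigenvalues of $\tilde P_\sigma$ and $\tilde G \tilde G^{\mathsf T}$, respectively. Because $\tilde P_\sigma$ is a projection operator, it has $N-1$ eigenvalues 1 and one eigenvalue 0. The latter corresponds to the right eigenvector $\sqrt{\pi}^{\mathsf T} = (\sqrt{\pi_1}, \ldots, \sqrt{\pi_N})^{\mathsf T}$.

The vector $\sqrt{\pi}^{\mathsf T}$ is a right eigenvector of $\tilde G \tilde G^{\mathsf T}$ with eigenvalue 1. From $\sigma_1 = 1$, we have that 1 is the dominant eigenvalue of $\tilde G \tilde G^{\mathsf T}$. Therefore, the eigenvalues of $\tilde P_\sigma \tilde G \tilde G^{\mathsf T}$ are given by the eigenvalues of $\tilde G \tilde G^{\mathsf T}$, except for one copy of its dominant eigenvalue 1, which is replaced by 0. In particular, $\|P_\sigma G\|_\pi$ is the square root of the dominant eigenvalue of $\tilde P_\sigma \tilde G \tilde G^{\mathsf T}$ or, equivalently, the second-largest singular value $\sigma_2$ of $\tilde G = U_\pi G U_\pi^{-1}$.

It remains to prove that $\sigma_2 < 1$ under the additional assumption. We already know that $\sigma_2 \le \sigma_1 = 1$; we now show that the inequality is strict. This follows from the Perron-Frobenius theorem applied to $\tilde G \tilde G^{\mathsf T}$. This matrix is nonnegative, has positive diagonal elements $\sum_k \tilde G_{ik}^2$, and has the same pattern of zero entries as $G G^{\mathsf T}$; it is therefore irreducible and primitive, so that its dominant eigenvalue 1 is simple and $\sigma_2 < 1$. Primitivity of $G$ alone is not sufficient: for the primitive chain on three states with $G_{12} = G_{23} = 1$ and $G_{31} = G_{32} = 1/2$, one finds $\sigma_2 = 1$. Finally, the eigenvalues of $P_\sigma G = G - P_\pi$ are those of $G$ with the eigenvalue 1 replaced by 0, so that its spectral radius is $|\lambda_2|$. Recalling that the norm of an operator is always greater than or equal to its spectral radius, $\rho(A) \le \|A\|$, we arrive at

$$|\lambda_2| \le \|P_\sigma G\|_\pi = \sigma_2 < 1, \tag{13}$$

where $\lambda_2$ is the second-largest (in modulus) eigenvalue of $G$. □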
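A numerical illustration of Theorem 1 and of Eq. (13), assuming a hypothetical chain with all entries positive (so it is primitive with positive diagonal): the sketch computes $\sigma_1$ and $\sigma_2$ from the singular values of $U_\pi G U_\pi^{-1}$, the norm $\|P_\sigma G\|_\pi$, and $|\lambda_2|$. The helper `sym` is an illustrative name for the similarity transform.

```python
import numpy as np

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1} with U_pi = diag(sqrt(pi))."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

# Illustrative chain with all entries positive (primitive, positive diagonal).
G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
pi = stationary(G)
N = len(pi)
P_sig = np.eye(N) - np.outer(np.ones(N), pi)

sv = np.linalg.svd(sym(G, pi), compute_uv=False)       # singular values, descending
sigma1, sigma2 = sv[0], sv[1]
norm_PsG = np.linalg.norm(sym(P_sig @ G, pi), 2)        # ||P_sigma G||_pi
lam2 = np.sort(np.abs(np.linalg.eigvals(G)))[::-1][1]   # |lambda_2|

print(f"sigma_1 = {sigma1:.6f}")
print(f"||P_sigma G||_pi = {norm_PsG:.6f}, sigma_2 = {sigma2:.6f}")
print(f"|lambda_2| = {lam2:.6f}")
```

Comparing `lam2` with `sigma2` shows how much of the gap in (13) is due to non-normality of the symmetrized matrix.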

Because $H$ is a transition matrix with the positive stationary distribution $\pi$, we deduce from Theorem 1 that $\|P_\sigma H\|_\pi = \eta_2 \le 1$, with $\eta_2$ the second-largest singular value of $U_\pi H U_\pi^{-1}$. Furthermore, we have

$$\begin{aligned}
\|P_\sigma H\|_\pi &= \|CDG - P_\pi\|_\pi = \|CDG - CDP_\pi\|_\pi \\
&= \|CD(G - P_\pi)\|_\pi \le \|CD\|_\pi\, \|P_\sigma G\|_\pi.
\end{aligned} \tag{14}$$

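Here we used $P_\sigma H = H - P_\pi$ and $CD P_\pi = P_\pi$. Noting that $CD$ is stochastic and that the similarity transform by $U_\pi$ makes $CD$ symmetric, so that its 2-norm equals its spectral radius, we conclude that $\|CD\|_\pi = 1$. This leads to

$$\|P_\sigma H\|_\pi \le \|P_\sigma G\|_\pi \quad \text{or} \quad \eta_2 \le \sigma_2. \tag{15}$$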
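The ingredients of Eqs. (14) and (15) can be checked in the same way: $U_\pi (CD) U_\pi^{-1}$ is symmetric with unit 2-norm, $\|P_\sigma H\|_\pi = \eta_2$, and $\eta_2 \le \sigma_2$. The example matrices, the Bayes form of $D$, and the helper names are illustrative assumptions.

```python
import numpy as np

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1}."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])
pi = stationary(G)
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
CD = C @ D
H = CD @ G
N = len(pi)
P_sig = np.eye(N) - np.outer(np.ones(N), pi)

CD_t = sym(CD, pi)
sigma2 = np.linalg.svd(sym(G, pi), compute_uv=False)[1]
eta2 = np.linalg.svd(sym(H, pi), compute_uv=False)[1]
norm_PsH = np.linalg.norm(sym(P_sig @ H, pi), 2)

print(np.allclose(CD_t, CD_t.T))            # U_pi symmetrizes CD
print(np.linalg.norm(CD_t, 2))               # ||CD||_pi
print(f"||P_sigma H||_pi = {norm_PsH:.6f}, eta_2 = {eta2:.6f}, sigma_2 = {sigma2:.6f}")
```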

We are now in a position to derive our first bound. The following theorem bounds the $n$-step difference $\|H^n - G^n\|_\pi$ in terms of the one-step difference $\|H - G\|_\pi$.

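Theorem 2. Let $G$ be a transition probability matrix, and let $H = CDG$ with $C$ and $D$ as introduced above. Define

$$\delta = \|H - G\|_\pi. \tag{16}$$

Then

$$\|H^n - G^n\|_\pi \le \delta\, K'(n) \le \delta\, K(n), \tag{17}$$

where $K'(n) = (\sigma_2^n - \eta_2^n)/(\sigma_2 - \eta_2)$ (understood as $n\sigma_2^{n-1}$ when $\eta_2 = \sigma_2$) and $K(n) = n\sigma_2^{n-1}$. Here $\sigma_2$ and $\eta_2$ are the second-largest singular values of $U_\pi G U_\pi^{-1}$ and $U_\pi H U_\pi^{-1}$, respectively.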

PROOF OF THEOREM 2. We start by expressing the $n$-step difference in terms of the one-step difference as in [12]. We note that

$$\|H^n - G^n\|_\pi = \|(H - G) H^{n-1} + G(H^{n-1} - G^{n-1})\|_\pi. \tag{18}$$

Iterating, we find

$$\|H^n - G^n\|_\pi = \Big\| \sum_{k=0}^{n-1} G^k (H - G) H^{n-k-1} \Big\|_\pi \le \sum_{k=0}^{n-1} \|G^k (H - G) H^{n-k-1}\|_\pi. \tag{19}$$
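Using the decompositions (10) and (11), we then find for integers $0 \le k \le n-1$

$$\begin{aligned}
\|G^k (H - G) H^{n-k-1}\|_\pi &= \|(P_\pi + P_\sigma G)^k (P_\sigma H - P_\sigma G)(P_\pi + P_\sigma H)^{n-k-1}\|_\pi \\
&= \|(P_\sigma G)^k (P_\sigma H - P_\sigma G)(P_\sigma H)^{n-k-1}\|_\pi \\
&\le \|P_\sigma G\|_\pi^k\, \delta\, \|P_\sigma H\|_\pi^{n-k-1}.
\end{aligned} \tag{20}$$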

Combining (19) and (20), we find

$$\|H^n - G^n\|_\pi \le \delta \sum_{k=0}^{n-1} \|P_\sigma G\|_\pi^k\, \|P_\sigma H\|_\pi^{n-k-1}. \tag{21}$$
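Carrying out the summation in (21) and using Theorem 1, we obtain the bound

$$\|H^n - G^n\|_\pi \le \delta\, K'(n) = \delta\, \frac{\sigma_2^n - \eta_2^n}{\sigma_2 - \eta_2}. \tag{22}$$

This expression can be bounded from above by combining (22) and (15) to give

$$K'(n) = \sum_{k=0}^{n-1} \sigma_2^k\, \eta_2^{n-k-1} \le K(n) = n\sigma_2^{n-1}, \tag{23}$$

with equality when $\eta_2 = \sigma_2$. □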
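To see Theorem 2 at work, the sketch below evaluates $\delta$, $K'(n)$ and $K(n)$ for a hypothetical example and compares $\delta K'(n)$ with the exact $\|H^n - G^n\|_\pi$. $K'(n)$ is computed directly as the sum in (21) rather than the closed form, which avoids dividing by $\sigma_2 - \eta_2$ when the two are close; the matrices and helper names are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import matrix_power

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1}."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

def op_norm(A, pi):
    """||A||_pi = largest singular value of U_pi A U_pi^{-1}."""
    return np.linalg.norm(sym(A, pi), 2)

G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])
pi = stationary(G)
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
H = C @ D @ G

delta = op_norm(H - G, pi)                                   # eq. (16)
sigma2 = np.linalg.svd(sym(G, pi), compute_uv=False)[1]
eta2 = np.linalg.svd(sym(H, pi), compute_uv=False)[1]

def K_prime(n):
    """sum_{k<n} sigma2^k eta2^(n-k-1), equal to (sigma2^n - eta2^n)/(sigma2 - eta2)."""
    return sum(sigma2 ** k * eta2 ** (n - k - 1) for k in range(n))

def K(n):
    return n * sigma2 ** (n - 1)

rows = []
for n in range(1, 21):
    diff = op_norm(matrix_power(H, n) - matrix_power(G, n), pi)
    rows.append((n, diff, delta * K_prime(n), delta * K(n)))
for n, diff, b1, b2 in rows[:6]:
    print(f"n={n:2d}  ||H^n-G^n|| = {diff:.3e}  delta*K' = {b1:.3e}  delta*K = {b2:.3e}")
```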

We now derive a bound independent of the one-step difference $\|H - G\|_\pi$.

Theorem 3. Let $G$ be a transition probability matrix, and let $H = CDG$ with $C$ and $D$ as introduced above. Then

$$\|H^n - G^n\|_\pi \le \sigma_2^n + \eta_2^n \le 2\sigma_2^n, \tag{24}$$

where $\sigma_2$ and $\eta_2$ are the second-largest singular values of $U_\pi G U_\pi^{-1}$ and $U_\pi H U_\pi^{-1}$, respectively.

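PROOF OF THEOREM 3. We observe that $G^n = (P_\pi + P_\sigma G)^n = P_\pi + (P_\sigma G)^n$ and $H^n = (P_\pi + P_\sigma H)^n = P_\pi + (P_\sigma H)^n$. Hence we have

$$\begin{aligned}
\|H^n - G^n\|_\pi &= \|(P_\sigma H)^n - (P_\sigma G)^n\|_\pi \\
&\le \|(P_\sigma H)^n\|_\pi + \|(P_\sigma G)^n\|_\pi \\
&\le \|P_\sigma H\|_\pi^n + \|P_\sigma G\|_\pi^n.
\end{aligned} \tag{25}$$

From Theorem 1 and $\|P_\sigma H\|_\pi \le \|P_\sigma G\|_\pi$, we deduce the inequalities (24). □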

We end this section with the following remarks. The bound $2\sigma_2^n$ from Theorem 3 is independent of $\delta$ and $\eta_2$. Accordingly, it is valid for any probabilistic coarse graining of the original chain. Whenever $\sigma_2 < 1$, this bound also shows that the coarse-graining error decreases exponentially in time. Relation (13) reveals that the fastest possible decay rate of the bound is $|\lambda_2|$. We show in the next section that this rate is achieved for special classes of Markov dynamics.
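Theorem 3 can be illustrated in the same setting. The sketch below, using the same hypothetical matrices and Bayes form of $D$, tabulates $\|H^n - G^n\|_\pi$ against $\sigma_2^n + \eta_2^n$ and $2\sigma_2^n$.

```python
import numpy as np
from numpy.linalg import matrix_power

def stationary(G):
    """Stationary row vector pi with pi @ G = pi."""
    w, V = np.linalg.eig(G.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1}."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

G = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
C = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.0, 1.0]])
pi = stationary(G)
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
H = C @ D @ G

sigma2 = np.linalg.svd(sym(G, pi), compute_uv=False)[1]
eta2 = np.linalg.svd(sym(H, pi), compute_uv=False)[1]

rows = []
for n in range(1, 31):
    diff = np.linalg.norm(sym(matrix_power(H, n) - matrix_power(G, n), pi), 2)
    rows.append((n, diff, sigma2 ** n + eta2 ** n, 2 * sigma2 ** n))
for n, diff, b1, b2 in rows[::5]:
    print(f"n={n:2d}  ||H^n-G^n|| = {diff:.3e}  s2^n+e2^n = {b1:.3e}  2 s2^n = {b2:.3e}")
```

Unlike Theorem 2, this bound does not need $\delta$, which is why it holds uniformly over all choices of $C$.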

IV. SPECIAL MARKOV CHAINS 

A. Reversible Markov chains 

A Markov chain is reversible if it obeys the detailed balance conditions

$$\pi_i G_{ij} = \pi_j G_{ji} \quad \forall\, i, j \in S. \tag{26}$$

This corresponds to an equilibrium situation in which no probability currents are present in the stationary state. We note that in this case $\bar G$ is also reversible. Under the detailed balance conditions the operator $U_\pi P_\sigma G U_\pi^{-1}$ is symmetric. This strong symmetry property of reversible Markov chains is at the basis of Hoffman and Salamon's analysis. Because the singular values of a symmetric matrix are the moduli of its eigenvalues, we have

$$\|P_\sigma G\|_\pi = |\lambda_2|, \tag{27}$$

where $\lambda_2$ is the second-largest (in modulus) eigenvalue of $G$. In this way we recover the bound $K(n) = n|\lambda_2|^{n-1}$ of [12].
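For the reversible case, a convenient way to build an example, assumed here for illustration, is to normalize the rows of a symmetric positive weight matrix $W$; then $\pi_i \propto \sum_j W_{ij}$ and detailed balance (26) holds. The sketch checks (26), the symmetry of $U_\pi G U_\pi^{-1}$, Eq. (27), and the reversibility of $\bar G$ under a hypothetical probabilistic lumping with Bayes-form $D$.

```python
import numpy as np

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1}."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

# Hypothetical symmetric weights; normalizing the rows gives a reversible chain.
W = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 1.0]])
G = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()              # stationary distribution of G

flux = pi[:, None] * G                    # pi_i G_ij
G_t = sym(G, pi)
N = len(pi)
P_sig = np.eye(N) - np.outer(np.ones(N), pi)

sigma2 = np.linalg.svd(G_t, compute_uv=False)[1]
lam2 = np.sort(np.abs(np.linalg.eigvals(G)))[::-1][1]
norm_PsG = np.linalg.norm(sym(P_sig @ G, pi), 2)

# Probabilistic lumping onto two mesostates (illustrative).
C = np.array([[1.0, 0.0],
              [0.6, 0.4],
              [0.0, 1.0]])
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
G_bar = D @ G @ C
flux_bar = pi_bar[:, None] * G_bar

print(np.allclose(flux, flux.T), np.allclose(G_t, G_t.T))   # eq. (26) and symmetric form
print(norm_PsG, sigma2, lam2)                              # eq. (27): all equal
print(np.allclose(flux_bar, flux_bar.T))                   # G_bar is reversible too
print([n * lam2 ** (n - 1) for n in range(1, 6)])          # K(n) = n |lambda_2|^(n-1)
```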
B. Doubly stochastic matrices



Doubly stochastic matrices are characterized by the following property:

$$\sum_{j} G_{ij} = \sum_{i} G_{ij} = 1, \tag{28}$$

i.e., both their rows and columns sum to one. This implies that the stationary distribution is uniform, $\pi = (1/N, \ldots, 1/N)$. In particular, we have $U_\pi A U_\pi^{-1} = A$ for any operator $A$. Doubly stochastic matrices do not necessarily satisfy the reversibility conditions (26). Notably, every normal stochastic matrix is doubly stochastic, although the converse does not hold in general (the doubly stochastic matrix with rows $(0, 1, 0)$, $(1/2, 0, 1/2)$, $(1/2, 0, 1/2)$ is not normal). Normal matrices commute with their transpose, and their singular values are the moduli of their eigenvalues. Noting that $P_\sigma G$ is normal if $G$ is, we conclude that for a normal doubly stochastic $G$, $\|P_\sigma G\|_\pi = \|P_\sigma G\|_2 = \sigma_2 = |\lambda_2|$, yielding

$$K(n) = n|\lambda_2|^{n-1}. \tag{29}$$
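A circulant matrix is a simple instance of a normal, doubly stochastic, non-reversible chain. The sketch below, with illustrative weights $0.5, 0.3, 0.2$ on $N = 4$ states, checks normality and evaluates $\|P_\sigma G\|_2$ and $|\lambda_2|$. For this choice $G = 0.5 I + 0.3 P + 0.2 P^2$ with $P$ the cyclic shift, so its eigenvalues are $0.5 + 0.3z + 0.2z^2$ for $z \in \{1, i, -1, -i\}$, i.e. $1$, $0.3 \pm 0.3i$ and $0.4$; both quantities should therefore equal $0.3\sqrt{2} \approx 0.4243$.

```python
import numpy as np

N = 4
P = np.roll(np.eye(N), 1, axis=1)                # cyclic shift: row i has its 1 in column i+1 (mod N)
G = 0.5 * np.eye(N) + 0.3 * P + 0.2 * (P @ P)    # illustrative circulant chain
pi = np.full(N, 1.0 / N)                         # uniform, so U_pi A U_pi^{-1} = A

P_sig = np.eye(N) - np.outer(np.ones(N), pi)
PsG = P_sig @ G

print(np.allclose(G.sum(axis=0), 1.0), np.allclose(G.sum(axis=1), 1.0))  # eq. (28)
print(np.allclose(G @ G.T, G.T @ G), np.allclose(G, G.T))   # normal, but not symmetric
print(np.allclose(PsG @ PsG.T, PsG.T @ PsG))                # P_sigma G is normal too

norm_PsG = np.linalg.norm(PsG, 2)                           # ||P_sigma G||_2 = sigma_2
lam2 = np.sort(np.abs(np.linalg.eigvals(G)))[::-1][1]       # |lambda_2|
print(norm_PsG, lam2, 0.3 * np.sqrt(2))
```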



V. CONTINUOUS-TIME MARKOV PROCESSES 



The previous construction can be extended to continuous-time Markov processes using the concept of uniformization.

The probability distribution $p(t)$ now obeys the dynamics

$$\frac{dp(t)}{dt} = p(t)\, L, \tag{30}$$

with the rate matrix elements $L_{ij} \ge 0$ for $i \ne j$ and $L_{ii} = -\sum_{j \ne i} L_{ij}$. Note that the rate matrix has negative elements. We assume that it has a unique stationary distribution $\pi$ such that $0 = \pi L$.

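We define the matrices $C$ and $D$ as above. The rows of the lumped dynamics $\bar L = DLC$ sum to zero, $\bar L_{\omega\omega} = -\sum_{\omega' \ne \omega} \bar L_{\omega\omega'}$. For deterministic coarse graining, its off-diagonal elements are also nonnegative, $\bar L_{\omega\omega'} \ge 0$ for $\omega \ne \omega'$, so that $\bar L$ is a rate matrix; for probabilistic coarse graining, the diagonal elements $L_{ii}$ can contribute negative terms to $\bar L_{\omega\omega'}$ when microstate $i$ emits both $\omega$ and $\omega'$, and this property is not guaranteed.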

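Starting from a distribution $p_0$, the difference between the two dynamics after a time $t$ reads

$$\|p_0 C e^{t\bar L} - p_0 e^{tL} C\|_{\bar\pi} = \|p_0 (e^{tH} - e^{tL}) C\|_{\bar\pi}, \tag{31}$$

where we defined the operator $H = CDL$ and, as in (9), vectors on $\Omega$ are measured with the norm built from $\bar\pi$.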

We now introduce the transition matrix $T(\beta) = I + L/\beta$, where $I$ is the identity operator and $\beta$ the uniformization parameter. To ensure that $T(\beta)$ is a proper transition matrix, $\beta$ must satisfy $\beta \ge \max_i |L_{ii}|$; for $\beta > \max_i |L_{ii}|$, all diagonal elements of $T(\beta)$ are moreover positive. The distribution $\pi$ is also the stationary distribution of $T(\beta)$: $\pi T(\beta) = \pi(I + L/\beta) = \pi$ for all $\beta$.
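The continuous-time objects can be set up as follows. This is a minimal sketch, assuming an illustrative three-state rate matrix, a deterministic lumping (so that $\bar L$ has nonnegative off-diagonal elements), and the Bayes form of $D$. It checks that $T(\beta)$ is a transition matrix with stationary distribution $\pi$, that the rows of $\bar L = DLC$ sum to zero, and that $C\bar L^k = H^k C$ for the first few $k$, which is the identity behind $C e^{t\bar L} = e^{tH} C$ in (31).

```python
import numpy as np
from numpy.linalg import matrix_power

L = np.array([[-1.0, 0.6, 0.4],
              [0.2, -0.5, 0.3],
              [0.7, 0.1, -0.8]])        # illustrative rate matrix (rows sum to zero)
w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()                       # stationary: pi @ L = 0

# Deterministic lumping: microstates 0 and 1 -> mesostate 0, microstate 2 -> mesostate 1.
C = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
pi_bar = pi @ C
D = (pi[:, None] * C).T / pi_bar[:, None]   # assumed Bayes form of D
L_bar = D @ L @ C                        # lumped generator
H = C @ D @ L                            # the operator CDL on the microstates

beta = 1.5                               # uniformization parameter, > max_i |L_ii| = 1
T = np.eye(3) + L / beta
print(np.all(T >= 0), np.allclose(T.sum(axis=1), 1.0), np.allclose(pi @ T, pi))
print(np.allclose(L_bar.sum(axis=1), 0.0), L_bar[0, 1] >= 0, L_bar[1, 0] >= 0)
# C L_bar^k = H^k C for every k, hence C exp(t L_bar) = exp(t H) C.
print(all(np.allclose(C @ matrix_power(L_bar, k), matrix_power(H, k) @ C) for k in range(6)))
```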

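We thus have

$$\begin{aligned}
\|P_\sigma e^{tL}\|_\pi &= \|P_\sigma e^{t\beta(T - I)}\|_\pi \\
&= \|P_\sigma e^{t\beta P_\sigma (T - I)}\|_\pi \\
&= \|P_\sigma e^{-t\beta P_\sigma} e^{t\beta P_\sigma T}\|_\pi \\
&\le e^{-t\beta}\, \|P_\sigma\|_\pi\, e^{t\beta \|P_\sigma T\|_\pi} = e^{-t\beta[1 - \sigma_2(\beta)]}.
\end{aligned} \tag{32}$$

In the second line we used that $P_\sigma (T - I) = T - I$, in the third line that $P_\sigma$ commutes with $P_\sigma T$, and in the last line that $P_\sigma e^{-t\beta P_\sigma} = e^{-t\beta} P_\sigma$, that $\|P_\sigma\|_\pi = 1$, and that $\|P_\sigma T\|_\pi = \sigma_2(\beta)$ by Theorem 1. Here $\sigma_2(\beta)$ is the second-largest singular value of $U_\pi T(\beta) U_\pi^{-1}$, which is smaller than one when $T(\beta)$ satisfies the conditions of Theorem 1.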

We have $\|H\|_\pi \le \|CD\|_\pi \|L\|_\pi = \|L\|_\pi$, from which we deduce $\exp(t\|H\|_\pi) \le \exp(t\|L\|_\pi)$. Taking into account (25), we obtain the bound

$$\|e^{tH} - e^{tL}\|_\pi \le 2 e^{-t\beta[1 - \sigma_2(\beta)]}, \tag{33}$$

which decreases exponentially in time. This bound can be further optimized by minimizing over the uniformization parameter $\beta$.
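The $L$ part of (32), $\|P_\sigma e^{tL}\|_\pi \le e^{-t\beta[1-\sigma_2(\beta)]}$, and its optimization over $\beta$ can be explored numerically. The sketch computes $e^{tL}$ with the uniformization series itself, $e^{tL} = e^{-x}\sum_k x^k T^k/k!$ with $x = tb$, $T = I + L/b$ and $b = \max_i |L_{ii}|$, truncated after a generous number of terms. The truncation rule, the grid of $\beta$ values, the example rate matrix, and the helper names are assumptions made for illustration; the sketch does not test (33).

```python
import numpy as np

def expm_unif(L, t):
    """exp(tL) for a rate matrix L via the uniformization series
    exp(-x) * sum_k x^k T^k / k!, with b = max_i |L_ii|, T = I + L/b and x = t*b."""
    n = len(L)
    b = np.max(np.abs(np.diag(L)))
    T = np.eye(n) + L / b
    x = t * b
    weight, term, out = np.exp(-x), np.eye(n), np.zeros((n, n))
    for k in range(int(x + 10 * np.sqrt(x)) + 50):   # generous truncation (assumption)
        out += weight * term
        term = term @ T
        weight *= x / (k + 1)                         # next Poisson weight
    return out

def sym(A, pi):
    """Similarity transform A -> U_pi A U_pi^{-1}."""
    s = np.sqrt(pi)
    return s[:, None] * A / s[None, :]

L = np.array([[-1.0, 0.6, 0.4],
              [0.2, -0.5, 0.3],
              [0.7, 0.1, -0.8]])      # illustrative rate matrix
w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()                     # stationary: pi @ L = 0
N = len(pi)
P_sig = np.eye(N) - np.outer(np.ones(N), pi)

betas = np.linspace(1.05, 6.0, 100)    # all larger than max_i |L_ii| = 1
sig2 = np.array([np.linalg.svd(sym(np.eye(N) + L / b, pi), compute_uv=False)[1]
                 for b in betas])      # sigma_2(beta) of U_pi T(beta) U_pi^{-1}

results = []
for t in (0.5, 1.0, 2.0, 4.0):
    E = expm_unif(L, t)
    lhs = np.linalg.norm(sym(P_sig @ E, pi), 2)      # ||P_sigma e^{tL}||_pi
    bounds = np.exp(-t * betas * (1.0 - sig2))       # right-hand side of (32)
    i = np.argmin(bounds)
    results.append((t, E, lhs, bounds[i], betas[i]))
    print(f"t={t}: lhs={lhs:.4f}, best bound={bounds[i]:.4f} at beta={betas[i]:.2f}")
```

Because the exponent is $-t\,\beta[1-\sigma_2(\beta)]$, the minimizing $\beta$ on the grid is the same for every $t$; only the value of the bound changes.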






VI. CONCLUSIONS 



We derived quantitative bounds on the error made by using a lumped Markov process instead of the unlumped dynamics. Notably, the deviations between the two levels of description can be uniformly bounded in terms of their deviation in one time step. The bounds are expressed in terms of the second-largest singular values of the transition probability matrices. These results generalize the work by Hoffman and Salamon [12] for reversible Markov chains and deterministic coarse graining. Our construction holds in discrete and continuous time, for non-reversible processes, and for probabilistic coarse graining. An important finding is that our bounds hold for all times and are not merely asymptotic.

The main technique making our bounds possible is the use of a carefully chosen operator norm. Exploiting the fact that transition matrices are nonnegative, we use a norm in which a transition matrix has norm equal to its spectral radius, namely one. On the other hand, important observables such as the statistics of current fluctuations are described in terms of generalized transition operators that are nonnegative but non-stochastic [16]. As our approach relies on the fact that the singular values of the symmetrized stochastic matrices $U_\pi G U_\pi^{-1}$ do not exceed one, it is not clear how to extend our arguments to these non-stochastic operators. In addition, the impact of coarse graining on observables that are nonlinear in the probability distribution, such as the entropy production, remains to be investigated [17].

Other dynamics on the aggregates that are consistent with the stationary state $\bar\pi$ can be defined. These dynamics can satisfy further requirements, such as that the net probability transfers between aggregates match those derived from the unlumped chain. The "gauge" freedom available in choosing the dynamics might be used to minimize the coarse-graining error while preserving relevant dynamical features.